One of the largest caveats in RNA sequencing, and extracellular RNA sequencing especially, is DNA contamination. Luckily, we can estimate the extent of this problem by investigating the metrics of our output. None of these metrics can establish DNA contamination on its own but, together, they can indicate an issue.
Many RNA sequencing library preparation protocols use a stranded approach. This means that the reads generated from a transcript map specifically to either the coding or the template strand. The SMARTer® Stranded Total RNA-Seq Kit - Pico Input Mammalian, for example, is a stranded protocol. Contaminating DNA is double-stranded, so both of its strands are sequenced and the affected part of your stranded library effectively becomes unstranded. Ideally, the fraction of “reads mapped to template strand” is close to 1. The closer this value gets to 0.5, the more likely it is that your data contain reads derived from DNA.
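As a rough illustration of the strandedness check, the sketch below computes the fraction of reads on the template strand from two read counts and flags values that drift toward 0.5. The function names, the example counts, and the 0.9 cut-off are assumptions made for this example, not values prescribed by the kit or by this pipeline.

```python
def template_strand_fraction(template_reads: int, coding_reads: int) -> float:
    """Fraction of strand-assigned reads that map to the template strand."""
    total = template_reads + coding_reads
    if total == 0:
        raise ValueError("no strand-assigned reads to evaluate")
    return template_reads / total


def looks_unstranded(fraction: float, cutoff: float = 0.9) -> bool:
    """Flag a sample whose fraction falls below the cut-off (0.9 is an assumed value)."""
    return fraction < cutoff


# Hypothetical read counts for two samples.
clean = template_strand_fraction(template_reads=9_700, coding_reads=300)
suspect = template_strand_fraction(template_reads=6_000, coding_reads=4_000)

print(clean, looks_unstranded(clean))
print(suspect, looks_unstranded(suspect))
```

A flagged sample is not proof of DNA contamination; it is a reason to look at the other metrics as well.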
Since the kits used here are total RNA sequencing kits, we expect at least some of the reads to map to splice junctions. Reads derived from DNA, in contrast, should rarely or never map to splice junctions, because genomic DNA still contains the introns.
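One way to approximate the share of splice-junction reads is to count alignments whose CIGAR string contains an `N` operation, which the SAM format uses for a skipped reference region such as an intron. The helper below is a minimal sketch under that assumption; its name and the example CIGAR strings are made up for illustration, and a real pipeline may take this number from the aligner's own reports instead.

```python
from typing import Iterable


def spliced_read_fraction(cigars: Iterable[str]) -> float:
    """Fraction of mapped reads whose alignment skips a reference region (CIGAR 'N')."""
    mapped = 0
    spliced = 0
    for cigar in cigars:
        if cigar == "*":  # SAM uses '*' when no alignment is available
            continue
        mapped += 1
        if "N" in cigar:
            spliced += 1
    if mapped == 0:
        raise ValueError("no mapped reads to evaluate")
    return spliced / mapped


# Hypothetical CIGAR strings: two unspliced, two spliced, one unmapped.
example_cigars = ["76M", "30M500N46M", "76M", "10M1200N66M", "*"]
print(spliced_read_fraction(example_cigars))
```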
The duplication rate is the fraction of reads that are duplicates of another read, and thus reflects how many unique RNA molecules were sequenced. Because of the low RNA concentration used with this kit, the percentage of duplicates can be quite high. However, a sample whose duplication rate deviates strongly from that of the other samples might indicate an issue.
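To make "strongly deviating" concrete, the sketch below derives a duplication rate from total and unique read counts and flags samples that lie far from the median of the batch. The median absolute deviation rule, the factor of 3, the function names, and the sample values are all assumptions chosen for illustration; any robust outlier rule could be substituted.

```python
from statistics import median
from typing import Dict, List


def duplication_rate(total_reads: int, unique_reads: int) -> float:
    """Fraction of reads that are duplicates of another read."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 1 - unique_reads / total_reads


def deviating_samples(rates: Dict[str, float], k: float = 3.0) -> List[str]:
    """Samples further than k median absolute deviations from the batch median."""
    centre = median(rates.values())
    mad = median(abs(rate - centre) for rate in rates.values())
    if mad == 0:
        # At least half of the samples share the median value; this simple rule gives up.
        return []
    return [name for name, rate in rates.items() if abs(rate - centre) > k * mad]


# Hypothetical duplication rates for one batch of samples.
batch = {"sample_A": 0.80, "sample_B": 0.82, "sample_C": 0.79, "sample_D": 0.35}
print(deviating_samples(batch))
```

Comparing against the batch median rather than a fixed threshold fits the point made above: high duplication is expected with this kit, so only the deviation between samples is informative.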
The spike fraction indicates the percentage of counts assigned to the spike-in RNAs. Ideally, the spikes take up a portion of the data large enough to perform downstream analyses, such as normalization. However, exceedingly high spike fractions consume an excessive share of the data, leaving little for true biological reads.
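The spike fraction can be sketched as a simple ratio of spike-in counts to all counts. The example below assumes that spike-in features can be recognized by an identifier prefix; the `ERCC-` prefix, the function name, and the count values are hypothetical and should be replaced by whatever naming your count table actually uses.

```python
from typing import Mapping


def spike_fraction(counts: Mapping[str, int], spike_prefix: str = "ERCC-") -> float:
    """Share of all counts assigned to features whose ID starts with spike_prefix."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("count table is empty")
    spike = sum(n for name, n in counts.items() if name.startswith(spike_prefix))
    return spike / total


# Hypothetical count table: two spike-in features and two endogenous genes.
example_counts = {"ERCC-00002": 300, "ERCC-00004": 100, "geneA": 450, "geneB": 150}
print(spike_fraction(example_counts))
```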